
8353230: Emoji rendering regression after JDK-8208377 #24412

Open
wants to merge 2 commits into
base: master

gredler
Contributor

@gredler gredler commented Apr 3, 2025

It looks like this regression actually fits into a longer series of fixes / regressions in this area:

  • JDK-4517298 fixed metrics for zero-width characters, but broke some ligatures / glyph substitutions
  • JDK-7017058 fixed the ligatures / glyph substitutions, but broke some zero-width metrics
  • JDK-8208377 fixed some metrics and rendering for zero-width characters, but broke some ligatures / glyph substitutions
  • Now, with this PR, we aim to fix the ligatures without re-breaking zero-width metrics and display

We have two different types of use cases pulling CharToGlyphMapper in two different directions: the users who need raw, untransformed glyph info, and the users who need normalized / transformed glyph info.

It looks to me like, in the current code base, the only CharToGlyphMapper user which requires raw font data is HarfBuzz (explicitly confirmed with the HarfBuzz team here: harfbuzz/harfbuzz#5234).

The regression mechanism at play here is that the HarfBuzz font callbacks are currently providing HarfBuzz with transformed glyph info (e.g. ZWJ -> INVISIBLE_GLYPH_ID), which prevents HarfBuzz from recognizing and applying the correct font GSUB substitutions (which involve ZWJ).

In order to fix this without (yet again) breaking metrics and display behavior elsewhere, I've added two methods to CharToGlyphMapper which provide access to raw glyph info, to be used by the HarfBuzz font callbacks: charToGlyphRaw(int) and charToVariationGlyphRaw(int).
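
To illustrate the raw vs. transformed distinction, here is a minimal sketch. It is not the code in this PR: `ToyGlyphMapper`, its `cmap` field and the two-character `isDefaultIgnorable` check are hypothetical stand-ins, and the sentinel values are assumed for illustration.

```java
import java.util.Map;

// Illustrative only: a toy mapper, not the JDK's CharToGlyphMapper.
class ToyGlyphMapper {
    // Assumed sentinel meaning "render nothing"; value chosen for illustration.
    static final int INVISIBLE_GLYPH_ID = 0xffff;
    // Assumed sentinel meaning "font has no glyph for this code point".
    static final int MISSING_GLYPH = 0;

    // Hypothetical cmap: code point -> glyph ID as stored in the font.
    private final Map<Integer, Integer> cmap;

    ToyGlyphMapper(Map<Integer, Integer> cmap) {
        this.cmap = cmap;
    }

    // Simplified stand-in for a default-ignorable check (ZWNJ and ZWJ only).
    static boolean isDefaultIgnorable(int codePoint) {
        return codePoint == 0x200C || codePoint == 0x200D;
    }

    // Raw lookup: the untransformed glyph ID a shaper would need to see.
    int charToGlyphRaw(int codePoint) {
        return cmap.getOrDefault(codePoint, MISSING_GLYPH);
    }

    // Transformed lookup: default-ignorable characters become invisible.
    int charToGlyph(int codePoint) {
        return isDefaultIgnorable(codePoint)
                ? INVISIBLE_GLYPH_ID
                : charToGlyphRaw(codePoint);
    }
}
```

With a font that maps ZWJ (U+200D) to glyph 7, `charToGlyphRaw(0x200D)` returns 7, so a GSUB rule involving ZWJ can still match, while `charToGlyph(0x200D)` returns `INVISIBLE_GLYPH_ID` for metrics and display.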

Note two intricacies related to CompositeGlyphMapper:

  1. We need to be careful to only cache raw (untransformed) values, to avoid conflicts between requests for a raw version of a glyph and a transformed version of the same glyph. Another option would have been two separate caches, but I don't think that's necessary.
  2. Consumers who are using CompositeGlyphMapper.SLOTMASK to check glyph slots (e.g. FontRunIterator and CTextPipe) will "see" invisible glyphs as having come from slot 0. This isn't new, and I think it's OK, but something to be aware of.
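
Both intricacies can be sketched in a few lines. This is a toy model under stated assumptions, not the actual CompositeGlyphMapper: `RawOnlyGlyphCache`, `rawLookup` and `slotOf` are hypothetical names, and the constants assume the slot is stored in the top 8 bits of the glyph code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Illustrative only: a toy cache, not the JDK's CompositeGlyphMapper.
class RawOnlyGlyphCache {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value
    static final int SLOTMASK = 0xff000000;       // assumed: slot in the top 8 bits

    private final Map<Integer, Integer> rawCache = new HashMap<>();
    private final IntUnaryOperator rawLookup;     // hypothetical font lookup
    int lookups = 0;                              // counts cache misses, for the demo

    RawOnlyGlyphCache(IntUnaryOperator rawLookup) {
        this.rawLookup = rawLookup;
    }

    // Simplified stand-in for a default-ignorable check (ZWNJ and ZWJ only).
    static boolean isDefaultIgnorable(int codePoint) {
        return codePoint == 0x200C || codePoint == 0x200D;
    }

    int getGlyph(int codePoint, boolean raw) {
        // Only the raw value is ever stored in the cache...
        int glyph = rawCache.computeIfAbsent(codePoint, cp -> {
            lookups++;
            return rawLookup.applyAsInt(cp);
        });
        // ...and the transformation is applied on the way out, so raw and
        // transformed requests for the same character cannot conflict.
        return (!raw && isDefaultIgnorable(codePoint)) ? INVISIBLE_GLYPH_ID : glyph;
    }

    // Extracts the slot index from a composite glyph code.
    static int slotOf(int glyphCode) {
        return (glyphCode & SLOTMASK) >>> 24;
    }
}
```

If the raw lookup places ZWJ in slot 2 as glyph 7 (`0x02000007`), a raw request reports slot 2, while the transformed result `0xffff` has no slot bits set and therefore reports slot 0, which is the behavior described in point 2.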

The glyph cache handling in CCharToGlyphMapper (for macOS) also requires care to avoid mixing value types.

Please also note that I'm not sure if the tweak to sunFont.c is being tested, since FFM is being used by default for HarfBuzz integration. (Is there a plan to remove the JNI version soon?)

This PR includes a self-contained regression test. It includes a small font created just for this test, which exercises the ligature / glyph substitution infrastructure. The font tests, including the new regression test, all pass locally on Linux, Windows and macOS (make test TEST="jtreg:test/jdk/java/awt/font").

Interestingly, the changes for JDK-7017058 (mentioned above) included a test (ZWJLigatureTest) which I think would have caught this last regression, but it depends on optional Windows fonts which I guess do not exist on any commonly-used test infrastructure. This should not be an issue with the new test, since it does not depend on any external fonts.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8353230: Emoji rendering regression after JDK-8208377 (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24412/head:pull/24412
$ git checkout pull/24412

Update a local copy of the PR:
$ git checkout pull/24412
$ git pull https://git.openjdk.org/jdk.git pull/24412/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24412

View PR using the GUI difftool:
$ git pr show -t 24412

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24412.diff

Using Webrev

Link to Webrev Comment


@bridgekeeper

bridgekeeper bot commented Apr 3, 2025

👋 Welcome back dgredler! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 3, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 3, 2025
@openjdk

openjdk bot commented Apr 3, 2025

@gredler The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the client client-libs-dev@openjdk.org label Apr 3, 2025
@mlbridge

mlbridge bot commented Apr 3, 2025

Webrevs

@YaaZ
Member

YaaZ commented Apr 15, 2025

We had similar emoji-related regressions at JetBrains. Although our font-related code diverged from OpenJDK a bit, porting this patch seems to resolve them too. I am not an OpenJDK reviewer, but LGTM nevertheless.

@gredler
Contributor Author

gredler commented Apr 21, 2025

@YaaZ Thanks for the information!

@prrace Have you had a chance to look at this PR?

@prrace
Contributor

prrace commented Apr 24, 2025

> @YaaZ Thanks for the information!
>
> @prrace Have you had a chance to look at this PR?

It passed all the testing I did. I still need to look hard at the changes.

@YaaZ
Member

YaaZ commented Apr 29, 2025

By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version for each of them. I can see that we will need other variants in the future, and this is already starting an exponential explosion. Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something. Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?
Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?
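
As a rough sketch of the single-method idea (hypothetical names throughout, not the JDK's CharToGlyphMapper API; `NO_VARIATION_SELECTOR` and the demo subclass are invented for illustration), the existing entry points could become thin wrappers around one overridable method:

```java
// Illustrative only: hypothetical names, not the JDK's CharToGlyphMapper API.
abstract class ConsolidatedMapper {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value
    static final int NO_VARIATION_SELECTOR = 0;   // hypothetical "no selector" marker

    // The single method each implementation would override.
    abstract int charToGlyph(int unicode, int variationSelector, boolean raw);

    // The other variants become thin, final wrappers.
    final int charToGlyph(int unicode) {
        return charToGlyph(unicode, NO_VARIATION_SELECTOR, false);
    }

    final int charToGlyphRaw(int unicode) {
        return charToGlyph(unicode, NO_VARIATION_SELECTOR, true);
    }

    final int charToVariationGlyph(int unicode, int variationSelector) {
        return charToGlyph(unicode, variationSelector, false);
    }

    final int charToVariationGlyphRaw(int unicode, int variationSelector) {
        return charToGlyph(unicode, variationSelector, true);
    }
}

// Toy implementation: the "glyph" is just the code point plus the selector.
class DemoConsolidatedMapper extends ConsolidatedMapper {
    @Override
    int charToGlyph(int unicode, int variationSelector, boolean raw) {
        if (!raw && unicode == 0x200D) {          // ZWJ only, for brevity
            return INVISIBLE_GLYPH_ID;
        }
        return unicode + variationSelector;
    }
}
```

This sketch covers only the int overloads; any char-based variants would still need separate methods.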

@gredler
Contributor Author

gredler commented May 1, 2025

@YaaZ: Thanks for the additional feedback, please see my thoughts below:

> By the way, I see that in each implementation, both charToGlyph and charToGlyphRaw call a common method, like getGlyph(int unicode, boolean raw). At first there was just charToGlyph, then charToVariationGlyph was added, and now you have added a "raw" version for each of them. I can see that we will need other variants in the future, and this is already starting an exponential explosion.

I don't know if I would call two changes to CharToGlyphMapper in 20 years an exponential explosion, but I get your point :-)

> Overriding all of those methods in each implementation brings quite a bit of boilerplate, and it becomes easier to miss something.

True, but again keep in mind that there are only 5 implementations, only one of which (the macOS CCharToGlyphMapper) has been added in the last 20 years.

> Maybe take a step back and refactor this into a single charToGlyph(int unicode, int variationSelector, boolean raw) version?

We'd still need separate methods for int vs. char, but I think this might reduce 5 methods down to 3? The changeset would be a bit more intrusive (lots of callers would need to change to reflect the new method signature). I'd be interested to hear thoughts from some of the reviewers on this one.

> Also, this raw parameter only really controls the isDefaultIgnorable check at the end of each method. Maybe we could factor this out without bringing it separately into each mapper implementation?

I prefer to think of it as controlling whether or not any transformations to INVISIBLE_GLYPH_ID happen (right now it's just for default-ignorable characters, but there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere).

Any ideas for what this refactoring might look like?

@YaaZ
Member

YaaZ commented May 1, 2025

I was talking about the explosion because there is a scenario in my mind, which I didn't make clear for everybody else. There is a change which I didn't have time to contribute, but would like to: it's related to composite fonts and variation selectors. We may need 2 variants for retrieving a glyph with a variation selector - one strictly matching a variation selector and another with a fallback to the base glyph, multiplied by raw/transformed versions, which adds 2 more methods. Not like it's a big problem, but given that they all end up calling a single method anyway... You get the point.
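
The two lookup flavors mentioned here might look roughly like the following. This is only a guess at the shape of such a change: `VariationLookup`, its maps and the packed `key` are hypothetical and not taken from any existing code.

```java
import java.util.Map;

// Illustrative only: a hypothetical lookup, not taken from the JDK.
class VariationLookup {
    static final int MISSING_GLYPH = 0; // assumed "no glyph" sentinel

    private final Map<Integer, Integer> baseGlyphs;   // code point -> glyph
    private final Map<Long, Integer> variationGlyphs; // (code point, selector) -> glyph

    VariationLookup(Map<Integer, Integer> baseGlyphs, Map<Long, Integer> variationGlyphs) {
        this.baseGlyphs = baseGlyphs;
        this.variationGlyphs = variationGlyphs;
    }

    // Packs a code point and a variation selector into one map key.
    static long key(int codePoint, int variationSelector) {
        return ((long) codePoint << 32) | (variationSelector & 0xffffffffL);
    }

    // Strict: succeeds only if the font has this exact variation sequence.
    int strict(int codePoint, int variationSelector) {
        return variationGlyphs.getOrDefault(key(codePoint, variationSelector), MISSING_GLYPH);
    }

    // Fallback: uses the base glyph when the variation sequence is absent.
    int withFallback(int codePoint, int variationSelector) {
        int glyph = strict(codePoint, variationSelector);
        return glyph != MISSING_GLYPH
                ? glyph
                : baseGlyphs.getOrDefault(codePoint, MISSING_GLYPH);
    }
}
```

Each flavor would then exist in a raw and a transformed version, which is where the two additional methods come from.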

> there may be other scenarios in the future, e.g. \r, \n and \t which currently are handled elsewhere).

Are those scenarios specific to a particular mapper/font type? I was thinking that those transformations are generic.

> Any ideas for what this refactoring might look like?

I was thinking about moving this default-ignorable check, or any potential generic transformation, into the base CharToGlyphMapper or even Font2D. For example, make the default implementation of CharToGlyphMapper.charToGlyph check for ignorable characters and then call charToGlyphRaw. Other implementations would then only need to override charToGlyphRaw.
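
A minimal sketch of that idea, assuming a simplified base class (`BaseMapper`, `LowByteMapper` and the two-character ignorable check are hypothetical, not the real CharToGlyphMapper):

```java
// Illustrative only: a hypothetical base class, not the JDK's CharToGlyphMapper.
abstract class BaseMapper {
    static final int INVISIBLE_GLYPH_ID = 0xffff; // assumed sentinel value

    // Implementations override only the raw lookup.
    abstract int charToGlyphRaw(int unicode);

    // The generic transformation lives once, in the base class.
    int charToGlyph(int unicode) {
        if (isDefaultIgnorable(unicode)) {
            return INVISIBLE_GLYPH_ID;
        }
        return charToGlyphRaw(unicode);
    }

    // Simplified stand-in for a default-ignorable check (ZWNJ and ZWJ only).
    static boolean isDefaultIgnorable(int unicode) {
        return unicode == 0x200C || unicode == 0x200D;
    }
}

// Toy implementation: the "glyph" is just the low byte of the code point.
class LowByteMapper extends BaseMapper {
    @Override
    int charToGlyphRaw(int unicode) {
        return unicode & 0xff;
    }
}
```

Here `new LowByteMapper().charToGlyph(0x200D)` returns `INVISIBLE_GLYPH_ID` without the subclass containing any ignorable-character logic, while `charToGlyphRaw(0x200D)` still returns the font's own value (`0x0D` in this toy).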

Labels
client client-libs-dev@openjdk.org rfr Pull request is ready for review

3 participants